Industry, Engineering and Science



This manual outlines nine statistical techniques, giving simple
definitions and examples, a summary of input and output, and
references to numerous applications and computer programs.
The techniques covered are: correlation, factor analysis,
cluster analysis, regression, discriminant analysis,
experimental analysis, evolutionary operation, Bayes
formula, and time series analysis.
INTRODUCTION



We need the quantitative methods of statistics both to clarify and to solve
problems in every science, engineering discipline, and industrial
enterprise.

This manual spotlights nine statistical techniques. They are general- 
purpose tools and attack problems in which many variables or factors 
operate simultaneously. They thrive where data are highly variable and 
where no neat, determinate mathematical model is known. Widely di- 
vergent groups — behavioral scientists, electrical engineers, steel 
manufacturers — use these techniques. 

Computers handle statistical techniques with great speed and accuracy.
They process many measurements on a great number of factors and
allow easy experimentation and analysis.

We cover the nine techniques in a short, simple manner. A thorough 
familiarity with statistics is not required to read this manual. The 
essential feature of a technique, the wide applicability, the perspective 
— these are made plain without cautions, hedgings, assumptions, or 
mathematical precision. Statistics is not taught here, or computer 
programming, or the subject matter implied in the illustrations (whether 
in geology, aeronautical engineering, or petroleum refining). However, 
for each technique area this manual: 

• gives a simple definition 

• illustrates its application 

• tells the type of data you begin with (inputs) and
what answers you end with (outputs)

• references numerous applications 

• lists some available computer programs. 

Remember in each of the nine sections to follow that the statistical
technique is only loosely defined. Generally, a technique has to satisfy
many conditions to be validly used, and is not infallible in effect.
However, some techniques are now controlling huge industrial operations.
Others are providing researchers new insights into what were once
complicated or puzzling situations.

Three kinds of listings accompany the majority of the techniques: 

• a file of applications with references 

• a list of computer programs in the "IBM Catalog 
of Programs" series 

• a list of references which cite the use of IBM 
Systems in carrying out a technique. 

The first item above covers applications and examples, some tutorial, 
some in actual operation. 



The second item cites computer programs for the techniques covered,
as well as for related methods. Each computer system has a program
library contributed to by IBM customers or IBM personnel. The so-called
Type I and Type II programs are supported by documentation and test
procedures. IBM serves solely as the distribution agent for Type III and
Type IV programs. More details on each program referenced can be
found in the "IBM Catalog of Programs" appropriate to a machine
(available through a local IBM Branch Office). There are, of course,
hundreds of statistical programs in use other than those in the IBM
Catalog.

The third item provides a lead to other possible sources of programs. 

References are to be found in alphabetic order in the back of this manual.
A citation is often made to a later source rather than to the original one.
At times a judgment about the use of a technique is based only on a title
or an abstract. "Computer Abstracts" and "The H. W. Wilson Company
Indexes" proved useful in gathering raw data on the use of a computer
or a technique.

Readers are encouraged to recommend other techniques, and contribute 
new citations to applications or programs. Write: Technical Publications, 
IBM Corporation, Data Processing Division, 112 East Post Road, White 
Plains, New York, 10601. We welcome comments and criticisms, too. 



CORRELATION

in essence

Correlation analysis measures the strength of relationship between two
or more variables.

example in paper-making

Correlation analysis reduced thirteen kinds of tests, run on each of a
great number of rolls of paper, to four. Ten of the tests were so highly
correlated (when one had high values, the others did too; when one had
low values, so did the others) that nine of the original tests could be
dispensed with. (Ref. Draper, p. 315)

example in nutrition

Correlations among 184 nutritional, metabolic, biochemical, and
physiological variables (vitamin intake, mineral intake, amounts
absorbed, characteristics of subjects, etc.) revealed important
interrelations. Significant insights came from observing how the
correlations changed as the experiment progressed in time. (Ref. James)

summary

From repeated measurements on a number of variables, correlation
analysis measures the strength of association between every pair of
variables. Degree of association runs from +1.0 (perfect direct
association), down through 0.0 (no correlation), to -1.0 (perfect inverse
association). In the paper-making example, you search for high
correlations (either high direct or high inverse) to cut out superfluous
tests. In the nutrition example, high correlations point to possible
cause and effect relations (although high correlation need not, and
usually does not, imply cause and effect). Many varieties of correlation
measures exist.
[Figure: CORRELATION ANALYSIS]

Input:

repeated measures on a set of variables V1, V2, V3, V4. These four
variables could be measures of height, weight, pulse, and blood
pressure on individuals M1, M2, ...

Output:

correlations: a high 0.9 for V1 and V2; a low 0.1 for V2 and V4; a
high inverse -0.8 for V3 and V4. Now highly correlated variables can
be collected and/or some eliminated. Searches can be made for possible
causes and effects, when appropriate.
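
The input-to-output step described above can be sketched in a few lines
of Python. This is only an illustration, assuming the ordinary
product-moment (Pearson) coefficient as the correlation measure; the
function names and the measurements on M1 to M5 are hypothetical and do
not come from this manual or from any program it lists.

```python
import math


def pearson(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(data):
    """data maps a variable name to its list of repeated measurements."""
    names = list(data)
    return {(a, b): pearson(data[a], data[b]) for a in names for b in names}


# Hypothetical measurements on five individuals M1..M5 (illustration only).
data = {
    "V1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "V2": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises exactly with V1
    "V3": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls exactly as V1 rises
}
corr = correlation_matrix(data)
for (a, b), r in corr.items():
    if a < b:  # print each pair once
        print(f"{a} vs {b}: {r:+.2f}")
```

With these made-up data V1 and V2 are perfectly correlated and V1 and
V3 are perfectly inversely correlated, so either V1 or V2 could be
dropped without loss, as in the paper-making example.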



applications

A few applications of correlation analysis:

Subject                        Reference

earth sciences                 Miller, ch. 13
psychotropic problems          Hall
paper testing                  Draper, p. 315
nutrition, metabolism          James
geochronology                  Martin
educational measurements       Cooley, p. 21
heart pathology                Tolles

catalog of programs

Programs available from "IBM Catalog of Programs" (Ref. IBM) in
correlation analysis and related areas:

Subject                 Computer   Program Form Number

correlation             7070       7070-11.3.005, .008
correlation programs    1620       1620-06.0.013, .015, .021, .022, .038,
                                   .039, .040, .051, .064, .097, .104,
                                   .114, .121, .125, .162, .170, .175,
                                   .188, .205
correlation             1401       1401-06.0.005, .006
correlation             360        360A-CM-03X
                        1130       1130-CM-02X

use of IBM systems

Citations in trade journals, periodicals and texts on the use of IBM
systems in carrying out correlation analysis:

Subject                            Computer   Reference

correlation pairing                709/0      Priest
tetrachoric correlation            709/0      Castellan
correlation, health of aviators    1620       Osborne
correlation                        709        Sorenson
biserial correlation               709/0      Castellan (2)
BIMD programs (14)                 7090       Massey
nutritional relationships          1620       James
geochronology                      7094       Martin



FACTOR ANALYSIS 



in essence

Factor analysis finds new, more fundamental quantities (the factors)
underlying measured variables.

example in advertising

In an advertising effectiveness study, 20 variables (size of ad, colors,
type sizes, copy blocks, product facts, benefits, pictures, readership,
etc.) bunch into 6 groups or factors. One factor related to the
pictorial and color variables, a second to ad size variables, a third to
typography variables, a fourth to information variables, etc. (Ref.
Ferber, p. 101)

example in credit

A factor analysis of 22 questions on a credit request revealed 6 basic
factors: two connected with questions on the transaction, and four
related to questions on personal history. (Ref. Myers)

example in psychology

Some of the 48 scores (variables) on a Rorschach test depended too
heavily on the number of responses given. Factor analysis into one
general factor (interpreted as productivity) and 15 other factors,
independent of the general factor, removed the difficulty. (Ref. Cooley,
p. 164)

summary

With repeated measurements on a set of variables, factor analysis
discerns underlying factors distinct from, and fewer in number than,
the original variables. Some factor analytic methods seek out a general
factor present in all original variables (the Rorschach example). In
other methods, original variables are grouped so as to depend on a few
distinct factors (advertising and credit examples).

Formulas which show each variable as a combination of the factors are
important outputs of factor analysis. For example, in the advertising
illustration above, the variable representing "number of words" equals

    0.05 F1 + 0.46 F2 - 0.10 F3 + 0.71 F4 + 0.00 F5 - 0.03 F6.

The 6 variables which, like this one, had a strong contribution from F4
all seem to concern the "information factor" in the ads.
[Figure: FACTOR ANALYSIS]

Input:

repeated measurements of variables V1, V2, V3, V4 (e.g., size, width,
color, words tallied for a series of ads M1, M2, etc.).

Output:

formulas for four variables in terms of fewer factors.

    LOADINGS ON FACTORS F1 AND F2:

    V1 = .68 F1 + .18 F2
    V2 = .92 F1 + .06 F2
    V3 = .02 F1 + .80 F2
    V4 = .11 F1 + .76 F2

V1 and V2 are based mainly on F1; V3 and V4 on F2. You now have fewer
factors to deal with. Hopefully they are more meaningful than the
original four variables.
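
Reading the output of a factor analysis can be sketched as follows. The
loadings are those shown in the figure; everything else (the helper
names, and the reading of squared loadings as explained variance, which
assumes standardized variables and uncorrelated factors) is an
assumption added for illustration. The sketch does not extract factors
from raw data; it only interprets loadings that are already available.

```python
# Loadings from the figure: variable -> (loading on F1, loading on F2).
loadings = {
    "V1": (0.68, 0.18),
    "V2": (0.92, 0.06),
    "V3": (0.02, 0.80),
    "V4": (0.11, 0.76),
}


def dominant_factor(row):
    """1-based index of the factor with the largest absolute loading."""
    return max(range(len(row)), key=lambda k: abs(row[k])) + 1


def communality(row):
    """Sum of squared loadings: the share of a variable's variance tied
    to the factors (assumes standardized variables, uncorrelated factors)."""
    return sum(v * v for v in row)


# Group the variables by the factor they depend on most.
groups = {}
for name, row in loadings.items():
    groups.setdefault(f"F{dominant_factor(row)}", []).append(name)

print(groups)
for name, row in loadings.items():
    print(name, round(communality(row), 2))
```

The grouping reproduces the statement in the text: V1 and V2 fall under
F1, and V3 and V4 under F2.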



applications

A few applications of factor analysis:

Subject                              Reference

metallurgy, steels, blast furnace    Spurrell
psychology, multiple time series     Anderson
metropolitan economy                 Carleton
industrial relations                 Boehr
consumer attitudes                   Adams
psychology                           Overall
physical fitness tests               Falls
psychological data                   Schonemann
tests on paper products              Draper, p. 316
refinery process                     Thomas
botanical applications               Pearce
language ability                     Weaver
listening tests                      Bateman
retail credit                        Myers
census data                          Massey, W.F.
earth sciences                       Miller, ch. 13
Rorschach test (48 dimensions)       Cooley, p. 164

catalog of programs

Programs available from "IBM Catalog of Programs" (Ref. IBM) in
factor analysis and related areas:

Subject                     Computer   Program Form Number

factor analysis             7070       7070-11.3.005, .008
factor analysis             1620       1620-06.0.053, .091, .094,
                                       .103, .145, .169
factor analysis, varimax    360        360A-CM-03X
                            1130       1130-CM-02X

use of IBM systems

Citations in trade journals, periodicals, and texts of the use of IBM
systems in carrying out factor analysis:

Subject                         Computer   Reference

factor analysis, oblique        7094       Hendrickson
multivariate statistics         7090/4     Jones
nonlinear factor analysis       7090/4     McDonald
factor analysis, rotation       7090       Wolf
principal axes                  650        Burket
principal components            7070       Bendig
BIMD programs (14)              7090       Massey, F.
convulsive disorders            704        Rodin
principal components            7090       Steidler
factor analysis, square root    7090       Lingoes
principal components            1620       Moore, D.W.
factor analysis (3 mode)        709        Walsh
principal axes                  709        Burket
psychological data              7094       Schonemann
language ability                709        Weaver

CLUSTER ANALYSIS 

in essence

Cluster analysis groups items or individuals by means of their
characteristics.

example in bacteriology

Cluster methods applied to bacteria have grouped them into their proper
orders, or grouped strains within a species. Classification of an
atypical plague bacillus later proved accurate. (Ref. Sokal, p. 259)

example in education

Attitudes toward mathematics were thought to have a significant relation
to general personality variables. In the study which confirmed this
hypothesis, 42 personality variables (dominance, sociability, tolerance,
etc.) clustered as follows. Variables 1, 2, 3, 7, 9, 10 constituted an
"extroversion" group. Variables 4, 5, 6, 7, 8, 10, a "conscientiousness"
cluster. Variables 5, 7, 11, 14, a "self-control" cluster, etc.
(Ref. Aiken)

summary

From characteristics of individuals in a large, unorganized collection,
cluster analysis groups similar individuals together. Among the many
variants of cluster methods, certain ones place the clusters into
hierarchies according to degree of similarity (the bacteriology reference
above). The technique also can cluster attributes (variables) and thus is
similar in purpose to factor analysis.
[Figure: CLUSTER ANALYSIS. Items joined into clusters 1 to 4 along a
similarity scale.]

Input:

four characteristics C1 to C4 measured on nine items. For example,
length, width, wing span, and antenna length are measured on
nine insects.

Output:

at about 0.6 on a degree of similarity scale, four clusters emerge.
Cluster 1 contains items I2, I7, I6. Cluster 2 has only I3, etc. At 0.2
on the similarity scale, only two clusters emerge: cluster 4 and a
coalescence of clusters 1, 2, 3. You might be searching for a genus at
similarity = 0.2; or for a species at similarity = 0.6; or strains at
similarity = 0.9.
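
The effect of cutting at different points on the similarity scale can
be sketched as below. This is one simple variant only (single linkage:
two items land in the same cluster whenever a chain of sufficiently
similar items connects them). The similarity measure 1/(1 + distance),
the function names, and the item measurements are assumptions chosen
for illustration, not the method behind the figure.

```python
import math


def similarity(a, b):
    """Similarity in (0, 1]: 1 for identical items, near 0 for distant ones."""
    return 1.0 / (1.0 + math.dist(a, b))


def clusters_at(items, threshold):
    """Single-linkage clusters obtained by linking every pair of items
    whose similarity is at least `threshold`."""
    names = list(items)
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the representative of n's cluster.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if similarity(items[a], items[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical items with two characteristics each (illustration only).
items = {
    "I1": (0.0, 0.0), "I2": (0.5, 0.0),
    "I3": (3.0, 0.0), "I4": (3.5, 0.0),
    "I5": (20.0, 0.0),
}
tight = clusters_at(items, 0.6)   # strict similarity: more, smaller clusters
loose = clusters_at(items, 0.2)   # relaxed similarity: clusters coalesce
print(tight)
print(loose)
```

Lowering the threshold from 0.6 to 0.2 merges the two nearby pairs into
one cluster while the distant item I5 stays alone, which mirrors the
coalescence described in the output above.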



applications

A few applications of cluster analysis:

Subject                                          Reference

botany (rice, manioc, farinosae)                 Sokal
paleontology (fish)                              Sokal
bacteria (soil, plague, actinomycetes)           Sokal
viruses, genes, protein patterns                 Sokal
plant ecology, soil classification, plankton     Sokal
physical anthropology                            Sokal
medical diagnosis                                Sokal
rock identification                              Sokal
oil exploration                                  Sokal
legislative voting patterns                      Sokal
language translation                             Sokal
pattern recognition                              Sokal
archeology                                       Sokal
library, document classification                 Sokal
philology, authorship tests                      Sokal
leukemia classification                          Whitfield
information retrieval                            Sokal
zoology (bees, mosquitos, man, cats, sponges)    Sokal
earth sciences                                   Miller
math and personality                             Aiken

catalog of programs

Programs available from "IBM Catalog of Programs" (Ref. IBM) in
cluster analysis and related areas:

Subject                 Computer   Program Form Number

optimal clustering      7092       7092-G2 IBM0026
continuous variables    7090       7090-ZO IBM0015
binary variables        7090       7090-ZO IBM0002
nonnumeric              1620       1620-06.0.201

use of IBM systems

Citations in trade journals, periodicals and texts of the use of IBM
systems in carrying out cluster analysis:

Subject                     Computer   Reference

taxonomic classification    7090       Lingoes (4)
disease classification      7090       Bonner (2)
cluster analysis            1620       Hyvarinen
pattern recognition         7090       Bonner

REGRESSION 

in essence

Regression analysis computes prediction formulas from data.

example in metallurgy

Regression related the Charpy fracture temperature to 11 variables
(percent carbon C, manganese Mn, phosphorus P, etc.) by the formula

    (311.7)C + (133.6)Mn + (1194.7)P + (57.5)Si - (26.4)Ni + ...

(Ref. Pehlke, p. 58)

example in hospitals

Regression found equations to predict hospital beds needed in various
case categories. For example,

    neurology cases per month = 6.65 + (0.635)X17 + (3.47)X18

where variable X17 is Indiana deaths and X18 is Indiana births. One
hundred seventeen variables (births, accident rates, number of
specialists, etc.) were screened. (Ref. Beenhakker)

example in agriculture

The method of regression found the explicit formula for milk output
as a function of X1 = concentrate feed, X2 = hand feed, X3 = grassland
acre days, and X4 = average cows in herd. The decimal exponents on
each of the variables X1 through X4 can be handled by using logarithms,
since taking logarithms turns the product of powers into a sum.

    milk output = 39.80 (X1)^0.24 (X2)^0.15 (X3)^[illegible] (X4)^[illegible]

(Ref. Cowling)

summary

Starting with data on several variables and an idea of the general
formula connecting the variables, regression analysis finds the specific
numerical formula relating the variables. Statements can usually be
made about how confident you are in the formula and in predictions using
it. For example, in the agriculture example above, about 75 percent of
the variation in the data is explained through the formula for milk output.
Reduction of masses of data to simple equations makes regression
important. Two other advantages are understanding the nature of a process
and prediction of future events.



[Figure: REGRESSION ANALYSIS]

Inputs:

data D1, D2, D3, ... on three variables V1, V2, V3. Also, the general
form of one in terms of others:

    FORMULA: V3 = A + B(V1) + C(V2)

Outputs:

    FORMULA: V3 = 0.1 + 1.2(V1) + 0.6(V2)
    R2 = 0.76
    F-VALUE = 4.06
    STANDARD ERROR B = 0.1
    STANDARD ERROR C = 0.02

the numeric coefficients in the regression formula: A = 0.1, B = 1.2,
C = 0.6. Also, certain quantities used to measure confidence in the
formula or in the coefficients. R2 = 0.76 means 76 percent of the
variation in V3 is explained by the equation in V1 and V2. The formula
has summarized much data; it can be used to predict a new V3
from V1 and V2.
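
How the coefficients A, B, and C come out of the data can be sketched
with ordinary least squares, solved here through the normal equations.
This is a minimal illustration, not the method of any program listed in
this manual: the function names are invented, and the data are
hypothetical and noise-free, generated from the formula in the figure
so that the fit can be checked by eye.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def fit(v1, v2, v3):
    """Least-squares coefficients [A, B, C] for V3 = A + B*V1 + C*V2."""
    rows = [(1.0, a, b) for a, b in zip(v1, v2)]
    # Normal equations: (X'X) coef = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, v3)) for i in range(3)]
    return solve(xtx, xty)


def r_squared(v1, v2, v3, coef):
    """Fraction of the variation in V3 explained by the fitted formula."""
    a, b, c = coef
    mean = sum(v3) / len(v3)
    ss_res = sum((y - (a + b * p + c * q)) ** 2 for p, q, y in zip(v1, v2, v3))
    ss_tot = sum((y - mean) ** 2 for y in v3)
    return 1.0 - ss_res / ss_tot


# Hypothetical noise-free data generated from V3 = 0.1 + 1.2(V1) + 0.6(V2).
v1 = [1.0, 2.0, 3.0, 4.0, 5.0]
v2 = [2.0, 1.0, 4.0, 3.0, 6.0]
v3 = [0.1 + 1.2 * a + 0.6 * b for a, b in zip(v1, v2)]

coef = fit(v1, v2, v3)
r2 = r_squared(v1, v2, v3, coef)
print([round(c, 3) for c in coef], round(r2, 3))
```

Because the made-up data contain no scatter, the fit recovers the
coefficients essentially exactly and R2 comes out at 1 rather than the
0.76 of the figure; real measurements would leave unexplained variation.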



applications

A few applications of regression analysis:

Subject                                            Reference

paper industry                                     Moore, P.G.
agriculture, milk production functions             Cowling
weather forecasting                                Glahn
educational measurements                           Cooley, p. 38
meteorology                                        Lund
bank debits as an indicator                        Carleton
construction economics                             Dilbeck
urban transportation                               Kain
U.S. import demand                                 Reimer
wartime production                                 Rapping
urban refuse collection                            Hirsch
open hearth production                             Leckle
earth sciences                                     Miller, ch. 8, 9, 17
gasoline production                                Ostle, p. 167, 183
psychological tests, canonical correlation         Meredith
psychological scores                               Meredith (2)
forging, metallurgy, lattice parameters            Pehlke
economic model of United Kingdom (simultaneous)    Ball
inertial navigation, error sources                 Eisner
aerodynamics, downwash                             Fromme
dairy production                                   Jarrett
X-ray fluorescence                                 Alley
foundry steels                                     Sprinkle
hospital bed needs                                 Beenhakker

catalog of programs

Programs available from "IBM Catalog of Programs" (Ref. IBM) in
regression analysis and related areas:

Subject                      Computer   Program Form Number

regression                   650        0650-00.0.056
regression                   705        0705-11.3.001
regression                   1410       1410-11.3.001, .002
regression                   7070       7070-11.3.001, .007, .011
regression                   1620       1620-06.0.001, .003, .006, .031, .042,
                                        .049, .057, .066, .077, .084, .101,
                                        .118, .120, .122, .142, .143, .154,
                                        .157, .159, .168, .173, .181, .187
nonlinear regression         704        0704-G2 3226N11
iterative least squares      709        0709-E2 3024LSQ
orthogonal polynomials       709        0709-E2 3197CF
systems of equations         709        0709-F1 3090NORM
mortality curves             7040       7040-G1 3356MORT
nonlinear least squares      7040       7040-02 3094T,TN
stepwise regression          7040       7040-G2 3205RRG
polynomial fit               7090       7090-E2 3289 PLYF
regression                   7090       7090-G2 3104RGNL
stepwise regression          7090       7090-G2 3143MPR2, MPR3
differential equations       7090       7090-G2 3146NLR
constrained regression       7094       7094-E2 3363BJ06
polynomial least squares     7094       7094-E2 3372AM26
regression                   1401       1401-06.0.002, .003, .004,
                                        .005, .007, .008
regression subroutines       360        360A-CM-03X
(multiple, polynomial,       1130       1130-CM-02X
canonical)

use of IBM systems

Citations in trade journals, periodicals, and texts of the use of IBM
systems in carrying out regression analysis:

Subject                       Computer   Reference

resonance spectra             1410       Nelson
multivariate programs (30)    7090/4     Jones
regression prediction         7090       Lingoes (2)
decay data                    0650       Worsley
simultaneous regression       7090       Lingoes (3)
psychophysiology              1620       Williams
multiple regression           7090       Steidler
heart disease                 0650       Ward
tomato factors                7070       Mittler
chemical analysis             0650       Winchell
BIMD programs (14)            7090       Massey, F.
generalized regression        0704       Eisenpress
metallurgy                    0704       Pehlke
aerospace navigation          7090/4     Eisner
hospital bed prediction       7090       Beenhakker
DISCRIMINATION

in essence

Discriminant analysis assigns individuals to known groups.

example in anthropology

Four measurements, X1, X2, X3, X4, taken on each of many human
and chimpanzee fossil teeth yielded a discriminant function

    D = X1 - (7.49)X2 + (2.34)X3 + (4.70)X4

with D averaging -5.0 ± 2.45 for human teeth and averaging +17.6 ± 2.45
for chimpanzee teeth. The Taungs skull, with a D = -7.9, was
subsequently classified as probably human. (Ref. Keeping, p. 366)

example in linguistics

Three hundred words in each of three languages (English, Swedish,
and Finnish) were treated by discriminant analysis. Quantitative
measures (e.g., number of A's, of B's, of syllables, etc.)
were taken on each word. A first discriminant function separates
Finnish from the other two; a second function distinguishes English and
Swedish. (Ref. Mustonen)

summary

Starting from known groups of individuals, each individual with measured
characteristics, discriminant analysis derives the so-called discriminant
functions. They allot the originally given individuals to their proper group
and new individuals to an appropriate group. The functions can't
always cleanly distinguish the groups as they did in the linguistics
example. A misclassification percent measures the effectiveness of
the discriminant functions in allocating the original individuals into
their known groups. Cluster analysis differs from discriminant analysis
in that cluster analysis discovers groups, whereas discriminant analysis
begins with recognized groups.



[Figure: DISCRIMINANT ANALYSIS. Input table of characteristics C1, C2,
C3 for individuals I1 to I5 in GROUP 1 and GROUP 2.]

    THE DISCRIMINANT FUNCTION FOUND IS
    F1 = 2.6 + 1.1(C1) + 0.9(C2) - 7.0(C3)
    WITH AVERAGE VALUE FOR
    GROUP 1 = 18.2 ± .9
    GROUP 2 = -1.3 ± .9

Inputs:

the five individuals I1 to I5 fall into two known groups.
Characteristics C1, C2, C3 are measured on each individual.

Outputs:

the discriminant function serves to classify a new individual into his
appropriate group on the basis of his characteristics.
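
Using a discriminant function once it has been found can be sketched
with the numbers of the anthropology example. The coefficients and the
two group averages are taken from that example (reading the human
average as -5.0); the rule of assigning a specimen to the group whose
average D is nearest is an assumption made for illustration, as are the
function names.

```python
# Coefficients of D = X1 - (7.49)X2 + (2.34)X3 + (4.70)X4 from the example.
COEFFS = (1.0, -7.49, 2.34, 4.70)

# Average D for each known group, as quoted in the example.
GROUP_MEANS = {"human": -5.0, "chimpanzee": 17.6}


def discriminant(x):
    """Value of D for one specimen with measurements (X1, X2, X3, X4)."""
    return sum(c * v for c, v in zip(COEFFS, x))


def classify(d):
    """Assign to the group whose average D is closest (assumed rule)."""
    return min(GROUP_MEANS, key=lambda g: abs(d - GROUP_MEANS[g]))


# The D value reported for the Taungs skull in the example.
print(classify(-7.9))
```

The Taungs value of -7.9 lies much closer to the human average than to
the chimpanzee average, so the sketch reaches the same classification
as the reference.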
applications

A few applications of discriminant analysis:

Subject                           Reference

judgments in cooking              Baten
brand loyalty                     Farley
coal analysis                     Baten (2)
earth sciences                    Miller, ch. 12
educational methods               Baten (3)
weather prediction                Glahn, p. 122
textile research                  Baten (4)
medicine                          Radhakrishna
soil differentiation              Cox
linguistic problems               Mustonen
biological taxonomy               Burnaby
U.S. National Health Survey       Fisher
physical anthropology             Ashton
meteorology                       Lund
basaltic lava discrimination      Chayes

catalog of programs

Programs available from "IBM Catalog of Programs" (Ref. IBM) in
discriminant analysis and related areas:

Subject                    Computer   Program Form Number

classification             1620       1620-06.0.076, .201, .208
discrimination             7094       7094-BMD 04M, 05M
discrimination routines    360        360A-CM-03X
                           1130       1130-CM-02X

use of IBM systems

Citations in trade journals, periodicals and texts of the use of IBM
systems in carrying out discriminant analysis:

Subject                    Computer   Reference

classification             7094       Kossack
multivariate statistics    7090/4     Jones
health survey              704        Fisher
EXPERIMENTAL DESIGN AND ANALYSIS

in essence

Experimental design and analysis methods decide whether various
factors or combinations of factors influence a result.

example in aluminum

The impact extrusion of aluminum turned out to be influenced by 7
alloy variations, 4 process variables, and 3 annealing temperatures.
(Ref. Pehlke)

example in ceramics

The physical properties of a semivitreous body were shown by
experimental design to depend on such factors as particle size, water,
entrapped air, thickness, firing rate. The water, air, and thickness
variables appeared linear; particle size was nonlinear. Combinations
between levels of particle size and other factors caused important
changes. (Ref. Conrad)

summary

One starts with measured outcomes for various combinations of
influencing factors (many at preselected values, some simply observed).
Experimental design and analysis techniques sort out the important
factors or combinations of factors producing the outcome. A host of
"designs" are used. Randomization and repetition of runs ensure
confidence in results.



[Figure: EXPERIMENTAL DESIGN AND ANALYSIS]

    MODEL: D = main effect + F1 + F2 + F3 + F1*F2 + E

    GRAND MEAN: 67.2

    SOURCE OF      DEGREES OF     MEAN
    VARIATION      FREEDOM        SQUARES

    F1                             63.1
    F2                            262.8
    F3                             50.2
    F1*F2                          85.1
    ERROR          11             102.9
    TOTAL          15

Inputs:

three factors F1, F2, F3, each with two levels (F1 at -5 and +5, for
example) are examined for their effect on outcome D, both separately
and in F1*F2 combination. These separate contributions are over and
above an average main effect and error, E. The outcome D is
measured twice (replicated).

Outputs:

the so-called "mean squares" and "degrees of freedom" determine whether
or not the factors or combinations really affect the outcome. Other
models including new factors or triple combinations F1*F2*F3 can be
tried.
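
How mean squares and degrees of freedom lead to a decision can be
sketched with the usual F ratio, the mean square of a source divided by
the error mean square. The mean squares are those of the figure. Two
things are assumptions added here: that each source carries one degree
of freedom (usual for two-level factors, but not legible in the
figure), and the tabulated 5 percent point of F for 1 and 11 degrees of
freedom, taken as 4.84.

```python
# Mean squares from the analysis of variance table in the figure.
mean_squares = {"F1": 63.1, "F2": 262.8, "F3": 50.2, "F1*F2": 85.1}
error_mean_square = 102.9  # error line of the table, 11 degrees of freedom

# Assumed: 5 percent point of F with (1, 11) degrees of freedom,
# i.e. one degree of freedom for each two-level source.
F_CRITICAL = 4.84

# F ratio: how large each source's mean square is relative to error.
f_ratios = {src: ms / error_mean_square for src, ms in mean_squares.items()}

# Sources whose F ratio exceeds the tabulated point are judged real effects.
significant = [src for src, f in f_ratios.items() if f > F_CRITICAL]

for src, f in f_ratios.items():
    print(f"{src:6s} F = {f:.2f}")
print("significant:", significant)
```

Under these assumptions none of the ratios reaches the tabulated point,
so this particular set of numbers would not by itself show that any
factor affects the outcome.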
applications

A few applications of experimental design and analysis:

Subject                                     Reference

testing rocket engines                      Wood
geology, paleontology, etc.                 Miller, ch. 7
electric machinery electrodes               Hicks
psychological experiments                   Caian
machinery metals                            Hamaker
psychotropic problems                       Hall
transistor industry                         Hamaker
aluminum impact extrusions                  Pehlke
castings                                    Brownlee
textiles, cotton spinning                   Peake
general review                              Hunter
steels, Ausforming process                  Duckworth
chemical tests of textiles                  Bainbridge
paper pack seals                            Moore, p. 307
soaps, R/D, manufacturing, marketing        Michaels
U.S. Patent Office, transistor circuits     Bryant, p. 204
automobile purchasing                       Jung
ceramics                                    Conrad
hardening of steels                         Hopkins, A. D.
etc.

catalog of programs

Programs available from "IBM Catalog of Programs" (Ref. IBM) in
experimental design and related areas:

Subject                 Computer   Program Form Number

variance analysis       7070       7070-11.5.002
covariance analysis     1620       1620-06.0.023, .024, .025, .032, .080,
                                   .092, .107, .109, .129
analysis of variance    1620       1620-06.0.026, .027, .028, .029, .030,
                                   .033, .041, .043, .060, .061, .062,
                                   .065, .069, .070, .081, .083, .086,
                                   .087, .088, .089, .102, .105, .113,
                                   .123, .132, .139, .140, .152, .161,
                                   .174, .176, .202, .207, .210, .213
variance, covariance    7040       7040-G2 3365 ANOV
analysis of variance    7090       7090-G4 3027 ANAWZ
factorial analysis      7094       7094-G4 3337 ANV
factorial analysis      1401       1401-06.0.012, .014
factorial design        360        360A-CM-03X
                        1130       1130-CM-02X

use of IBM systems

Citations in trade journals, periodicals, and texts of the use of IBM
systems in carrying out experimental design and analysis:

Subject                        Computer   Reference

variance, mean difference      7072       Turk
paired comparisons             7090       Gulliksen
electroencephalogram, anova    7090       Sorkin
Latin squares                  7090       Gilbert, E.N.
incomplete factorial           7094       Webb
variance, covariance           7090       Finn
multivariate statistics        7090/4     Jones
analysis of variance           7074       Hemmerle
analysis of variance           7070       Bendig
BIMD programs                  7090       Massey, F.
factor analysis                1410       Cientat
analysis of variance           709/90     Hopkins
variance, covariance (7)       1401       Sterling
EVOLUTIONARY OPERATION

in essence

Evolutionary operation resets variables step by step to make a process
better.

example in chemicals

A high-cost organic chemical process has yields depending on two
important factors: reaction time and weight of additives. Starting
from time = 75 minutes and weight = 120 pounds, evolutionary operation
gradually improved yield from 562 to 588 pounds by adjusting time to
25 minutes and weight to 110 pounds. (Ref. De Busk)

example in petroleum

In a catalytic cracking application four variables (feed rate, reactor
temperature, recycle, and space velocity) were adjusted to maximize
catalytic distillate yield and catalytic light gas oil yield. The levels
and order of successive runs were imposed by operating personnel.
(Ref. Klingel)

summary

Given a process whose yield (or some other response) is a function of
a number of factors, evolutionary operation tries small changes in two
or three controlling factors to improve the yield gradually. The slight,
deliberate changes in process variables must cause some improved
response over and above mere random fluctuation.



[Figure: EVOLUTIONARY OPERATIONS. Two information boards, each showing
the response at 5 settings on axes F1 and F6.]

    PHASE 6   CYCLE 10

    RESPONSE AT 5 SETTINGS: 46, 47, 45, 42, 43

    RESPONSE IMPROVEMENT
    DUE TO:      AMOUNT
    F1            .6 ± .9
    F6           1.9 ± .9
    F1*F6         .1 ± .9
    MEAN          .3 ± .8

    PHASE 6   CYCLE 11

    RESPONSE AT 5 SETTINGS: 48, 49, [remaining values illegible]

    RESPONSE IMPROVEMENT
    DUE TO:      AMOUNT
    F1           -.2 ± .6
    F6            .8 ± .6
    F1*F6         .1 ± .6
    MEAN         [illegible]

Inputs:

a process has been operating during previous phases (with other
settings for factors, or other factors, or another response variable).
Factor F6 at its high level causes the response to exceed mere "noise"
(± .9). The process was then shifted permanently to this high value of
F6 and evolutionary operations begun.

Outputs:

response is measured at settings of F1 and F6, around the central
operating point. Again the high setting of F6 improved the response.
F1 did not affect the response; however, next cycle around, F1's
operating limits were spread wider to discover its influence.
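
The arithmetic behind one such cycle can be sketched as follows for two
factors run at a center point and four surrounding corners. The
formulas are the standard two-level factorial contrasts; the function
name, the generic factor labels A and B, the responses, and the noise
limit are all hypothetical, and the sketch does not attempt to
reproduce the numbers on the boards above.

```python
def evop_effects(center, low_low, high_low, low_high, high_high):
    """Effects for one two-factor cycle.

    Arguments are average responses at the center point and at the four
    corners, each corner named (level of A)_(level of B).
    """
    effect_a = (high_low + high_high - low_low - low_high) / 2.0
    effect_b = (low_high + high_high - low_low - high_low) / 2.0
    interaction = (low_low + high_high - high_low - low_high) / 2.0
    # How far the average over all five settings sits from the center point.
    corners = low_low + high_low + low_high + high_high
    change_in_mean = (corners + center) / 5.0 - center
    return {"A": effect_a, "B": effect_b, "A*B": interaction,
            "mean": change_in_mean}


# Hypothetical responses (illustration only).
effects = evop_effects(center=50.0, low_low=48.0, high_low=48.5,
                       low_high=52.0, high_high=52.5)

NOISE_LIMIT = 0.9  # hypothetical error limit for an effect
real_effects = [name for name in ("A", "B", "A*B")
                if abs(effects[name]) > NOISE_LIMIT]
print(effects)
print("exceeds noise:", real_effects)
```

Only the effect of factor B stands clear of the assumed noise limit
here, so the next phase would recenter the process toward the high
level of B, in the same spirit as the shift to the high value of F6
described above.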
applications

A few applications of evolutionary operation:

Subject                              Reference

geological facies maps               Miller, p. 394
alloying process, quality control    Bingham
textiles, spinning machinery         Dudley
chemical process                     De Busk
catalytic cracking                   Klingel
response surfaces, general           Bradley
cutting-tool equation                Wu
chemical process                     Kochler
chemical plants                      Box
quality in plastic processes         Harrington
BAYES FORMULA

in essence       Bayes formula finds probable causes from measured effects.

example in       Medical knowledge is available in the form of probabilities of
medicine         a complex of symptoms arising from a given disease complex.
                 Also known are the frequencies of occurrence of the disease
                 complexes. Bayesian techniques discover the most probable
                 disease complex from a newly presented set of symptoms.
                 (Ref. Ledley, Sect. 12-4)

example in       In sampling and quality control there are subjective estimates
quality control  for the chances that different fractions of a lot are
                 defective. On the basis of new information from samples,
                 Bayesian analysis revises these subjective estimates.
                 (Ref. Locke)

summary          Given the probabilities that certain effects follow certain
                 causes (based on historical data), and given the frequencies
                 of occurrence (or probabilities) of the causes (based on
                 historical or subjective knowledge), Bayes formula finds the
                 probability of a cause behind a newly presented effect.
                 Strictly speaking, Bayesian statistics or Bayesian decision
                 theory is a quite wide field. Here we narrow attention to
                 using Bayes formula to deal with what can be looked upon as
                 causes (C) and effects (E). The probability, P(Ci|Ej), of a
                 cause Ci given the presence of an effect Ej is

                 $$P(C_i \mid E_j) = \frac{P(C_i)\,P(E_j \mid C_i)}{\sum_{k} P(C_k)\,P(E_j \mid C_k)}$$

                 where the sum in the denominator runs over all causes k (all
                 diseases, in the medical example).

                 In the medical example, symptoms and diseases take the place
                 of effects and causes. In the quality control example, sample
                 information (effect) revises subjective estimates of the
                 fraction (cause) of lot defectives.
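
To make the formula concrete, here is a minimal Python sketch that turns a
table of P(effect | cause) and a list of P(cause) into a table of
P(cause | effect). The function name, the labels C0 to C2 and E0 to E1, and
every probability value are hypothetical, chosen only so that the arithmetic
is easy to follow.

```python
def bayes_posteriors(priors, likelihoods):
    """Apply Bayes formula to every effect.

    priors:      {cause: P(cause)}
    likelihoods: {cause: {effect: P(effect | cause)}}
    returns:     {effect: {cause: P(cause | effect)}}
    """
    effects = sorted({e for row in likelihoods.values() for e in row})
    posteriors = {}
    for effect in effects:
        # Numerator of Bayes formula for each cause: P(C) * P(E | C).
        joint = {c: priors[c] * likelihoods[c].get(effect, 0.0) for c in priors}
        # Denominator: the same product summed over all causes.
        total = sum(joint.values())
        posteriors[effect] = {
            c: (joint[c] / total if total > 0 else 0.0) for c in priors
        }
    return posteriors


# Hypothetical inputs (not the values of the manual's figure).
priors = {"C0": 0.5, "C1": 0.3, "C2": 0.2}
likelihoods = {
    "C0": {"E0": 0.9, "E1": 0.1},
    "C1": {"E0": 0.2, "E1": 0.8},
    "C2": {"E0": 0.5, "E1": 0.5},
}

posteriors = bayes_posteriors(priors, likelihoods)
# Most probable cause for each effect: the largest entry in its column.
most_probable = {e: max(p, key=p.get) for e, p in posteriors.items()}

for effect, column in posteriors.items():
    print(effect, {c: round(v, 3) for c, v in column.items()})
print("most probable causes:", most_probable)
```

Each column of posterior probabilities sums to one, and scanning a column for
its largest entry gives the most probable cause of that effect.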



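[Figure: BAYESIAN ANALYSIS. The inputs are a table of P(effect/cause) and a
list of P(cause); the outputs are a table of P(cause/effect). Entries are
given as printed; trailing entries are not legible in the source.]

Inputs           P(E0/C0) = .9     P(E1/C0) = ...
                 P(E0/C1) = .2     P(E1/C1) = ...
                 P(E0/C2) = .0     P(E1/C2) = ...

                 P(C0) = .9     P(C1) = .7     ...

Outputs          P(C0/E0) = .2     P(C0/E1) = ...
                 P(C1/E0) = .1     P(C1/E1) = ...
                 P(C2/E0) = .4     P(C2/E1) = ...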



Inputs:

P(E1/C2) means the probability of effect E1 given the existence of cause C2.
These kinds of probabilities are known, together with the probabilities of
the causes. P(C2) is that of cause C2.

Outputs:

Probabilities of causes from a knowledge of effects. There is a probability
of .4 that cause C2 underlies effect E0. Most probable causes for given
effects are evident from a scan of a column of probabilities.
applications     A few applications of Bayesian analysis:

                 Subject                                     Reference

                 reliability, quality control                Schafer
                 sequential life testing                     Ginsburg
                 quality control                             Locke
                 competitive bidding, equipment selection    Peterson
                 clinical trials                             Novick
                 statistical decision                        Greyson
                 quality control                             Hamburg
                 marketing research costs                    Bass
                 medical diagnosis                           Ledley, Sect. 12-4
                 hospital engineering                        Aitchison
                 psychiatric classification                  Birnbaum
                 control theory                              Ho
                 criminalistics                              Kingston
                 military information processing             Kaplan
TIME SERIES ANALYSIS

in essence       Time series methods analyze, compare, and predict quantities
                 which change in time.

example in       Ten years of monthly data on hog production reveal, through
agriculture      time series methods, a major 12-month cycle, a medium 6-month
                 cycle, and a minor 4-month cycle. The model accounts for a
                 shifting of peaks and troughs over the ten-year period.
                 (Ref. Abel)

example in       Of 800 time series examined for their relationship to a
economics        general business cycle, 21 series were selected as statistical
                 indicators. Eight led the general cycle, 5 lagged behind it,
                 and 8 coincided with it. The predictive ability of the
                 indicators was satisfactory. (Ref. Chou, p. 566)

example in       To discover underlying geologic structures in inaccessible or
geophysics       covered regions, aeromagnetic maps are taken. Autocorrelation
                 and spectral analysis techniques discover dominant trends,
                 faults, and lithologic periodicities. (Ref. Horton)

summary          With measurements over time on one or several quantities,
                 time series techniques (1) smooth out the quantity (weighted
                 averages); (2) extract important trend-cycle-seasonal
                 ingredients (seasonal adjustment); (3) discover pure cyclic
                 ingredients (harmonic analysis); or (4) compare pairs of
                 series (cross-correlation). Prediction is a major objective.
                 The agriculture example above is a use of harmonic analysis.
                 The other two examples relate to cross- and auto-correlation.
                 The geophysics illustration substitutes variables as
                 functions of distance for variables depending on time.
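
As a small illustration of harmonic analysis, the Python sketch below
measures how strongly cycles of chosen lengths are present in a monthly
series. The series is synthetic (a level of 100 plus 12-, 6-, and 4-month
cosine waves with amplitudes 10, 5, and 2), and the function name and all
numbers are hypothetical; it is not the model of the hog production study.
The simple formula used assumes the series covers a whole number of each
cycle, as ten years of monthly data does for these periods.

```python
import math


def harmonic_amplitude(series, period):
    """Amplitude of the cosine wave of the given period present in the series.

    Projects the series onto a cosine and a sine of that period; exact when
    the series length is a whole number of periods.
    """
    n = len(series)
    omega = 2.0 * math.pi / period
    a = (2.0 / n) * sum(x * math.cos(omega * t) for t, x in enumerate(series))
    b = (2.0 / n) * sum(x * math.sin(omega * t) for t, x in enumerate(series))
    return math.hypot(a, b)


# Synthetic ten-year monthly series (120 values), made up for illustration.
series = [
    100.0
    + 10.0 * math.cos(2.0 * math.pi * t / 12.0)
    + 5.0 * math.cos(2.0 * math.pi * t / 6.0 + 0.5)
    + 2.0 * math.cos(2.0 * math.pi * t / 4.0 + 1.0)
    for t in range(120)
]

# Cycle size at each candidate period, in months.
amplitudes = {p: harmonic_amplitude(series, p) for p in (12, 6, 4, 3)}

for period, amplitude in amplitudes.items():
    print(f"{period:2d}-month cycle: amplitude {amplitude:.2f}")
```

The 12-, 6-, and 4-month cycles come out as major, medium, and minor, while
the 3-month period, which was not built into the series, shows essentially
no amplitude.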





[Figure: TIME SERIES ANALYSIS. Input: a table of the quantities Q1, Q2, Q3,
Q4 observed at times T1, T2, T3, T4, ... Operations: (1) moving averages on
Q1; (2) seasonal adjustment on Q2; (3) harmonics of Q3; (4) cross-correlations
of Q1 and Q4.]



Inputs:

One or several quantities, Q1, Q2, Q3, Q4, are known over time. A variety of
approaches in time series allow prediction of the quantities at future times.



Outputs:

(1) 20 day moving averages on Q1 might be 97.8, 96.2, 95.9, ...

(2) Q2 without seasonals might be 15.8, 17.2, 18.3, 16.1, ...

(3) the cycle size of Q3 might be .2 for 7 day, .5 for 14 day, ...

(4) for lags of 0, 1, 2, ... the cross correlation of Q1 and Q4 might be
    .98, .80, .67, .20, ...
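
Outputs (1) and (4) can be sketched in a few lines of Python. The sketch
below is a minimal illustration under stated assumptions: the function names
are hypothetical, the two series are synthetic sine waves (the second is the
first delayed by two time steps), and the cross correlation is normalized by
the spread of the full series, which is one common convention among several.

```python
import math


def moving_average(series, window):
    """Unweighted moving averages over consecutive windows of the series."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


def cross_correlation(x, y, lag):
    """Correlation between x[t] and y[t + lag], for lag >= 0.

    Both series are centered on their own means, and the sum of products is
    divided by the spread of the full series.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    spread_x = math.sqrt(sum((v - mean_x) ** 2 for v in x))
    spread_y = math.sqrt(sum((v - mean_y) ** 2 for v in y))
    products = sum(
        (x[t] - mean_x) * (y[t + lag] - mean_y) for t in range(n - lag)
    )
    return products / (spread_x * spread_y)


# Synthetic series, made up for illustration: y repeats x two steps later.
x = [math.sin(2.0 * math.pi * t / 20.0) for t in range(100)]
y = [math.sin(2.0 * math.pi * (t - 2) / 20.0) for t in range(100)]

smoothed = moving_average(x, 5)
correlations = {lag: cross_correlation(x, y, lag) for lag in range(6)}
# The lag with the largest correlation estimates how far y trails x.
best_lag = max(correlations, key=correlations.get)

for lag, value in correlations.items():
    print(f"lag {lag}: {value:.2f}")
print("best lag:", best_lag)
```

Scanning the correlations by lag in this way is how a leading or lagging
relationship between two series would show up.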
applications     A few applications of time series:

                 Subject                                        Reference

                 chemical process control, cum sum              Truax
                 biological systems                             Attinger
                 tracking (radar, sonar), exp. smoothing        Helms
                 mobile radio reception, power spectra          Gilbert, E. N.
                 new-car sales forecasting                      Dyckman
                 stock prices, random walks                     Fama
                 consumer attitudes, regressions                Adams
                 earth sciences                                 Miller, Ch. 15
                 electroencephalography, cross-correlation      Ledley, Sect. 10-2
                 X-ray patterns, Fourier series                 Pehlke
                 hog production, harmonic analysis              Abel
                 textiles, periodograms, correlograms           Foster
                 meteorology, power spectra                     Craddock
                 inventory, production control, exponential     Wagle
                 economic cycles, spectral analysis             Adelman
                 marine profiles, spectral analysis             Neidell
                 labor force, employment, seasonal adjustment   Shiskin
                 closing stock prices                           Brown
                 radar data                                     Jenkins
                 aeromagnetic maps                              Horton



catalog of       Programs available from "IBM Catalog of Programs" (Ref. IBM)
programs         in time series analysis and related areas:

                 Subject                    Computer   Program Form Number

                 seasonal adjustment        650        0650-06.0.041
                 autocovariance             7070       7070-11.2.001, .002
                 power spectrum             1620       1620-06.0.005, .056, .126,
                                                       .133, .147, .166
                 seasonal adjustment        1620       1620-06.0.054
                 business cycles            709        0709- Gl 3103 BCA
                 econometric forecasts      7090       7090-GO 32419 FES
                 series decomposition       7090       7090-Cl 3144 TSDA
                 seasonal adjustment        1401       1401-CA 04X and -06.0.015
                 time series                7094       7094-BMD OlS, OSS
                 time series routines       360        360A-CM 03X
                 (averaging, exponential    1130       1130-CM 02X
                 smoothing, Fourier
                 analysis, auto- and
                 cross-correlation)
use of IBM       Citations in trade journals, periodicals, and texts of the use
systems          of IBM systems in carrying out time series analysis:

                 Subject                        Computer   Reference

                 upper winds, smoothing         7090       Reiter
                 autocorrelation                7090       Schmid
                 geophysics                     7090       Simpson
                 economic time series           7090       Karreman
                 power systems spectra          0650       Cooke
                 filtering                      7094       Whittlesey
                 sales, exponential smoothing   0704       Whelan
                 X-ray patterns                 0709       Pehlke